Feature: Handle read only columns by driv3r · Pull Request #437 · Shopify/ghostferry

driv3r · 2026-04-16T11:35:19Z

ref: #400 & @proton-lisandro-pin

The back and forth is taking a bit of time, so let's speed this up. I've cherry-picked your commits so the attribution is there. The CLA was signed on the original PR; if you could sign the one here as well, that would be 👍

Handle MySQL Generated Columns (STORED and VIRTUAL) in Data Replication

This PR adds support for MySQL generated columns (both VIRTUAL and STORED) to Ghostferry, enabling proper handling of computed columns during selective data replication.

Problem Statement

MySQL 8.0.23 introduced significant changes to how generated columns are handled in binary log ROW events:

- Virtual columns are completely omitted from binlog events (not stored on disk)
- Stored columns are included in binlog events (computed once and persisted)

Without special handling, Ghostferry would fail or produce incorrect results when replicating tables with generated columns, as it would attempt to insert values into columns that cannot be modified or would have incorrect column positions.

Solution

This PR implements a comprehensive solution with four key components:

- Row Expansion - Detects when MySQL omits virtual columns from binlog events and re-inserts nil sentinels to maintain consistent full-schema column indexing throughout the pipeline.
- Insert Value Filtering - Filters out generated column values before constructing INSERT statements, so only modifiable columns are inserted, while the proper column metadata is still used for value escaping.
- Unsigned Integer Normalization - Row expansion now happens before unsigned integer normalization, so column indexes refer to the full schema at every stage.
- Verification with Generated Columns - Includes all columns (generated ones too) in fingerprint queries to detect divergence when computed values differ between source and target databases.
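As a rough illustration of the row expansion step, here is a minimal sketch. It is not the code from this PR: the `column` type and the `expandRow` helper are hypothetical, and it assumes a short row is missing exactly the virtual columns, in schema order.

```go
package main

import "fmt"

// column is a hypothetical, simplified stand-in for a schema column.
type column struct {
	Name    string
	Virtual bool // true for VIRTUAL generated columns
}

// expandRow re-inserts nil placeholders at the positions of virtual columns
// when the incoming row is short by exactly the number of virtual columns.
// Rows that already have full-schema width are returned unchanged.
func expandRow(cols []column, row []interface{}) ([]interface{}, error) {
	if len(row) == len(cols) {
		return row, nil
	}
	virtual := 0
	for _, c := range cols {
		if c.Virtual {
			virtual++
		}
	}
	if len(row) != len(cols)-virtual {
		return nil, fmt.Errorf("row has %d values, expected %d or %d",
			len(row), len(cols), len(cols)-virtual)
	}
	expanded := make([]interface{}, 0, len(cols))
	next := 0 // index of the next unread value in the short row
	for _, c := range cols {
		if c.Virtual {
			expanded = append(expanded, nil) // sentinel keeps indexes aligned
			continue
		}
		expanded = append(expanded, row[next])
		next++
	}
	return expanded, nil
}

func main() {
	cols := []column{{Name: "id"}, {Name: "total", Virtual: true}, {Name: "qty"}}
	row, err := expandRow(cols, []interface{}{int64(1), uint32(5)})
	fmt.Println(row, err)
}
```

After expansion, index 2 refers to `qty` regardless of whether the event carried the virtual column, which is why any per-column normalization has to run after this step.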

Changes

Core Implementation

dml_events.go: Modified INSERT event handling to filter out generated columns and improved binlog event processing for MySQL 8.0.23+ compatibility
table_schema_cache.go: Added column classification and filtering utilities:
- IsColumnGenerated() - Identifies virtual and stored columns
- NonGeneratedColumnNames() - Returns only insertable columns
- FilterGeneratedColumnsOnRowData() - Removes generated column values from rows
row_batch.go: Updated row batch handling for filtered column data
iterative_verifier.go: Updated verification logic to handle generated columns
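To make the filtering idea concrete, the sketch below builds a parameterized INSERT that skips generated columns. It is a simplified stand-in, not the helpers listed above: `schemaColumn`, `quoteIdent` and `buildInsert` are hypothetical names, and the row is assumed to be full-schema width.

```go
package main

import (
	"fmt"
	"strings"
)

// schemaColumn is a hypothetical, simplified column description.
type schemaColumn struct {
	Name      string
	Generated bool // true for both VIRTUAL and STORED generated columns
}

// quoteIdent wraps an identifier in backticks, doubling embedded backticks.
func quoteIdent(name string) string {
	return "`" + strings.ReplaceAll(name, "`", "``") + "`"
}

// buildInsert returns a parameterized INSERT that skips generated columns.
// row must be full-schema width: one value per entry in cols.
func buildInsert(table string, cols []schemaColumn, row []interface{}) (string, []interface{}, error) {
	if len(row) != len(cols) {
		return "", nil, fmt.Errorf("row has %d values, schema has %d columns", len(row), len(cols))
	}
	names := make([]string, 0, len(cols))
	args := make([]interface{}, 0, len(cols))
	for i, c := range cols {
		if c.Generated {
			continue // MySQL computes this value; it is not given an explicit one
		}
		names = append(names, quoteIdent(c.Name))
		args = append(args, row[i])
	}
	if len(names) == 0 {
		return "", nil, fmt.Errorf("table %s has no insertable columns", table)
	}
	placeholders := strings.TrimSuffix(strings.Repeat("?, ", len(names)), ", ")
	query := fmt.Sprintf("INSERT INTO %s (%s) VALUES (%s)",
		quoteIdent(table), strings.Join(names, ", "), placeholders)
	return query, args, nil
}

func main() {
	cols := []schemaColumn{{Name: "id"}, {Name: "full_name", Generated: true}, {Name: "first"}}
	query, args, err := buildInsert("users", cols, []interface{}{1, "ignored", "Ada"})
	fmt.Println(query, args, err)
}
```

Because the schema and the row stay full width until this point, the column list and the value list are filtered in the same loop and cannot drift apart.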

Test Coverage

Added unit tests for generated column handling with edge cases:
- Virtual columns before unsigned integer columns
- Generated columns before JSON columns (ensures JSON casting is preserved)
- Mixed VIRTUAL and STORED columns
Added integration tests confirming:
- Verification detects divergence in computed generated column values
- Stored and virtual generated columns are handled correctly
- Interrupt/resume scenarios work with generated columns

Edge Cases Handled

✅ Virtual columns before unsigned integer columns
✅ Generated columns before JSON columns (JSON casting preserved)
✅ Mixed VIRTUAL and STORED columns in same table
✅ MySQL version differences (pre-8.0.23 vs 8.0.23+)
✅ Interrupt/resume with generated columns
✅ Verification with computed column divergence detection

Testing

5 new commits with comprehensive testing
Tests for critical edge cases involving generated columns and other column types
Integration tests verifying both inline and checkpoint verification modes
Fixed race condition in interrupt/resume tests

Related Issue

Closes #338

This PR modifies all `INSERT` logic so that Ghostferry does not attempt to insert into generated MySQL columns (virtual or stored), which otherwise breaks the ferrying process. See also #338.

driv3r · 2026-04-16T19:31:40Z

Hey @plisandro! There were a bunch more things to tackle here. Everything should be in place now, so feel free to review and test, and please sign the CLA mentioned in the checks for the contributions 👍

ghost · 2026-04-17T08:51:46Z

> Hey @plisandro! There were a bunch more things to tackle here. Everything should be in place now, so feel free to review and test, and please sign the CLA mentioned in the checks for the contributions 👍

This is much appreciated, thank you! Been testing it today and your PR seems to work well. CLA is now signed as well 😄

driv3r · 2026-04-17T09:19:33Z

Hey @plisandro I'm not super familiar with the CLA stuff, but the error says:

> @plisandro: Sign the CLA and comment "I have signed the CLA!" to re-run the checks and have your PR reviewed.

Leave a comment and let's see. Also, I think you may have committed under your original account, so you might need to comment/sign under it as well 🤔

plisandro · 2026-04-17T09:45:33Z

I have signed the CLA!

plisandro · 2026-04-17T09:46:24Z

> Leave a comment and let's see. Also, I think you may have committed under your original account, so you might need to comment/sign under it as well 🤔

Done, and apologies for the confusion - I'm in the process of merging accounts now, and this was actually the last contribution left with the old one 🤦

driv3r · 2026-04-17T11:15:13Z

@plisandro no problem!

milanatshopify · 2026-04-23T18:36:11Z

+
+// Evaluates whether a TableSchema column is generated, by name.
+func (t *TableSchema) IsColumnNameGenerated(name string) bool {
+	for _, col := range t.Columns {


Are we worried about iterating through columns every time? I know there can't be that many, so it's probably not an issue, but we do it in a few places in this file.

plisandro · 2026-04-23

I tested this locally, and it has no discernible runtime impact when compared to a (previous) version which filters columns beforehand.

As noted in #400 (comment), the problem with doing any pre-processing on columns is that they won't later align with DML and binlog updates, so the logic to handle these becomes a mess.

milanatshopify · 2026-04-24

Sure, but is it worth it? See comment on row_batch.go.

plisandro · 2026-04-24

Note that "a mess" includes being a tad slower 😄 Processing row updates when generated columns are missing from the table definitions means that you effectively need to keep track of those columns, and their indices, to filter them out on every update, which also involves re-aligning row data.

Filtering generated columns when SQL is constructed is actually simpler/cheaper. Either way, I tested this in a setup with tables with 50+ columns and couldn't measure any significant difference.

milanatshopify · 2026-04-24T14:17:54Z

-			flattened[rowIdx*rowSize+colIdx] = col
+	flattened := make([]interface{}, 0, len(e.values)*len(e.columns))
+
+	for _, row := range e.values {


This is now going through all rows, all columns, then all columns again. #rows * #cols^2. Or am I reading it wrong?

Asking because this will add up when you have 86 columns in a large table, and we're doing string comparisons, no?

plisandro · 2026-04-24

The original change followed TableSchema as it exists today, where ignored and compressed columns are indexed by name.

Didn't seem to have any relevant performance hit on my test setup, but if this is a consideration, we could use a hashmap instead. Note it'd have to be lazily-initialized, as TableSchema is initialized ad-hoc all over ghostferry.
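For reference, a lazily-initialized lookup along those lines could look like the sketch below. It is only an illustration: `lazySchema`, `lazyColumn` and `isGenerated` are hypothetical names, not the actual TableSchema API, and it assumes the column list does not change after the first lookup.

```go
package main

import (
	"fmt"
	"sync"
)

// lazyColumn is a hypothetical, simplified column description.
type lazyColumn struct {
	Name      string
	Generated bool
}

// lazySchema builds its generated-column set on first use, so callers that
// construct the struct ad hoc do not need to call an initializer.
// Use it through a pointer: a sync.Once must not be copied after first use.
type lazySchema struct {
	Columns []lazyColumn

	once      sync.Once
	generated map[string]struct{}
}

// isGenerated reports whether the named column is generated.
// The first call scans Columns once; later calls are a single map lookup.
func (s *lazySchema) isGenerated(name string) bool {
	s.once.Do(func() {
		s.generated = make(map[string]struct{}, len(s.Columns))
		for _, c := range s.Columns {
			if c.Generated {
				s.generated[c.Name] = struct{}{}
			}
		}
	})
	_, ok := s.generated[name]
	return ok
}

func main() {
	schema := &lazySchema{Columns: []lazyColumn{{Name: "id"}, {Name: "total", Generated: true}}}
	fmt.Println(schema.isGenerated("id"), schema.isGenerated("total"))
}
```

This turns the per-column name check from a linear scan into a constant-time lookup, so flattening a batch costs roughly #rows * #cols instead of #rows * #cols^2. The trade-off is that columns appended after the first lookup are not reflected in the set.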

plisandro added 2 commits April 16, 2026 13:33

Handle generated columns on ghostferry write operations.

023e0b5

This PR modifies all `INSERT` logic so virtual (a.k.a generated) MySQL columns are not attempted to insert into, which otherwise breaks the ferrying process. See also #338.

Improve filtering logic, fix integration tests.

e9fd3c1

driv3r self-assigned this Apr 16, 2026

driv3r added the enhancement and go labels Apr 16, 2026

github-actions Bot added the cla-needed label Apr 16, 2026

driv3r mentioned this pull request Apr 16, 2026

Handle generated columns on ghostferry write operations. #400

Open

driv3r added 4 commits April 16, 2026 13:54

Add tests reproducing edge cases

696608c

Make things work and fix interrupt/resume test race condition

85be83c

Working feature

c0b7b00

Cleanup and gh-285 regression fix

89ee08f

driv3r marked this pull request as ready for review April 16, 2026 19:28

driv3r requested review from a team, austenLacy, forge33 and milanatshopify April 16, 2026 19:29

github-actions Bot removed the cla-needed label Apr 17, 2026

milanatshopify reviewed Apr 23, 2026

milanatshopify reviewed Apr 24, 2026

driv3r assigned grodowski Apr 30, 2026
